
Performance: parallel file I/O and optimized reading #44

Open
wojtek wants to merge 6 commits into zeux:master from wojtek:performance-improvements


wojtek commented Feb 3, 2026

Summary

  • Add parallel file I/O with read-ahead buffering for index building
  • Use Windows FILE_FLAG_SEQUENTIAL_SCAN for better prefetching
  • Pre-allocate file buffers to avoid O(n²) reallocation
  • Minor optimizations: memchr for line finding, unordered_map for regex cache

Benchmark Results (UE5.6.1 Engine: 186k files, 2GB)

| Scenario | Baseline | Improved | Speedup |
|---|---|---|---|
| Cold Build | 4.47s | 2.72s | 1.64x (39% less time) |
| Incremental | 1.15s | 1.14s | ~1.0x (no change) |

Incremental updates only scan file metadata, so no improvement is expected there.

Changes

  1. fileutil_win.cpp - readFileOptimized() with FILE_FLAG_SEQUENTIAL_SCAN
  2. build.cpp - Parallel read-ahead with multiple reader threads
  3. stringutil.hpp - Use memchr for faster line-end finding
  4. blockingqueue.hpp - notify_one instead of notify_all
  5. project.cpp - unordered_map for regex cache

Use the Windows API directly with the FILE_FLAG_SEQUENTIAL_SCAN hint for better
prefetching when reading files sequentially. This improves I/O throughput
during index building.

The POSIX implementation returns an empty result so that callers fall back to
standard file reading.
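
A minimal sketch of this fast path plus the fallback, assuming the function returns the file contents as a std::vector<char> and signals "no fast path" with an empty result. The signatures and the readFile wrapper are hypothetical, not the PR's code; the Win32 calls (CreateFileA, GetFileSizeEx, ReadFile, CloseHandle) and the FILE_FLAG_SEQUENTIAL_SCAN flag are the real APIs.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#endif

// Hypothetical fast path: returns the whole file, or an empty vector when the fast
// path is unavailable (non-Windows, open/read failure, or files of 2 GB and larger).
std::vector<char> readFileOptimized(const char* path)
{
#ifdef _WIN32
    // FILE_FLAG_SEQUENTIAL_SCAN tells the cache manager to expect front-to-back access
    HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr, OPEN_EXISTING,
                              FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
    if (file == INVALID_HANDLE_VALUE)
        return {};

    std::vector<char> result;
    LARGE_INTEGER size;
    if (GetFileSizeEx(file, &size) && size.QuadPart > 0 && size.QuadPart < 0x7fffffff)
    {
        result.resize(static_cast<size_t>(size.QuadPart));  // single allocation
        DWORD bytesRead = 0;
        if (ReadFile(file, result.data(), static_cast<DWORD>(result.size()), &bytesRead, nullptr))
            result.resize(bytesRead);
        else
            result.clear();
    }
    CloseHandle(file);
    return result;
#else
    (void)path;
    return {};  // POSIX: no fast path, the caller falls back to standard reading
#endif
}

// Hypothetical portable entry point: try the fast path, then fall back to std::ifstream
std::vector<char> readFile(const char* path)
{
    assert(path != nullptr);
    std::vector<char> data = readFileOptimized(path);
    if (!data.empty())
        return data;

    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```

With this empty-means-fallback convention an empty file simply goes through both paths, which is harmless but worth knowing if real code needs to distinguish "empty" from "failed".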

Replace the manual loop with memchr(), which is typically optimized with
SIMD instructions for faster scanning.
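
A sketch of the memchr-based scan next to the loop it replaces. The name findLineEnd comes from the review later in this thread, but the signature here (a pointer range in, a pointer to the newline or to end out) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns a pointer to the first '\n' in [begin, end), or end if there is none.
inline const char* findLineEnd(const char* begin, const char* end)
{
    // C libraries typically vectorize memchr, so it examines many bytes per step
    const void* newline = std::memchr(begin, '\n', static_cast<std::size_t>(end - begin));
    return newline ? static_cast<const char*>(newline) : end;
}

// The byte-at-a-time loop that memchr replaces, kept for comparison
inline const char* findLineEndLoop(const char* begin, const char* end)
{
    while (begin != end && *begin != '\n')
        ++begin;
    return begin;
}
```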

Only one waiting producer needs to be woken when space becomes available,
reducing unnecessary thread wake-ups.
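
To illustrate when notify_one is sufficient, here is a minimal queue bounded by item count (CountBoundedQueue is hypothetical, not the project's blockingqueue.hpp). Every pop frees exactly one slot, so at most one blocked producer can make progress and waking a single one loses nothing. Whether that reasoning carries over depends on how the real queue measures its capacity.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical queue bounded by item count
template <typename T>
class CountBoundedQueue
{
public:
    explicit CountBoundedQueue(std::size_t capacity): maxItems(capacity) { assert(capacity > 0); }

    void push(T item)
    {
        std::unique_lock<std::mutex> lock(mtx);
        notFull.wait(lock, [&] { return items.size() < maxItems; });
        items.push_back(std::move(item));
        notEmpty.notify_one();  // one new item can unblock at most one consumer
    }

    T pop()
    {
        std::unique_lock<std::mutex> lock(mtx);
        notEmpty.wait(lock, [&] { return !items.empty(); });
        T item = std::move(items.front());
        items.pop_front();
        notFull.notify_one();  // one freed slot can unblock at most one producer
        return item;
    }

private:
    std::size_t maxItems;
    std::deque<T> items;
    std::mutex mtx;
    std::condition_variable notEmpty, notFull;
};
```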

Replace std::map with std::unordered_map for O(1) average lookup
instead of O(log n) when caching compiled regex patterns.
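
A sketch of a compiled-pattern cache keyed by pattern string. std::regex stands in for whatever regex engine the project actually uses, and RegexCache is a hypothetical name; the point is the std::unordered_map lookup and the fact that references to its elements stay valid across rehashing.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <unordered_map>

// Hypothetical cache of compiled patterns; std::regex stands in for the real engine.
class RegexCache
{
public:
    // Compiles each distinct pattern once. Returned references stay valid because
    // unordered_map never moves its elements, even when it rehashes.
    const std::regex& get(const std::string& pattern)
    {
        auto it = cache.find(pattern);  // average O(1) hash lookup instead of O(log n)
        if (it == cache.end())
            it = cache.emplace(pattern, std::regex(pattern)).first;
        return it->second;
    }

    std::size_t size() const { return cache.size(); }

private:
    std::unordered_map<std::string, std::regex> cache;
};
```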

- Pre-allocate the file buffer using a size hint to avoid O(n^2) reallocation
- Use readFileOptimized() with FILE_FLAG_SEQUENTIAL_SCAN on Windows
- Fall back to standard FileStream for non-Windows or special files
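
A sketch of size-hinted reading with a plain std::ifstream, roughly corresponding to the fallback path. readFileWithHint is a hypothetical name, and the hint is assumed to come from file metadata the indexer already has; an outdated hint only costs extra reallocations, never wrong output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: sizeHint is assumed to come from metadata gathered while listing files.
std::vector<char> readFileWithHint(const char* path, std::size_t sizeHint)
{
    std::vector<char> result;
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return result;

    // One allocation up front: inserting up to sizeHint bytes below will not reallocate
    result.reserve(sizeHint);

    char chunk[64 * 1024];
    while (in)
    {
        in.read(chunk, sizeof(chunk));
        result.insert(result.end(), chunk, chunk + in.gcount());  // gcount covers partial reads
    }
    return result;
}
```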

Use multiple reader threads to overlap file I/O with processing during
index building. Reader threads read ahead while the main thread consumes
files in order, improving throughput on systems with high I/O latency.

The number of reader threads scales with available CPU cores, and a
sliding window prevents readers from getting too far ahead.
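
A minimal sketch of ordered read-ahead with a sliding window, using only the standard library. readAheadInOrder, its parameters, and the single shared condition variable are assumptions for illustration, not the PR's build.cpp; error handling (for example a read callback that throws) is omitted.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: `readers` threads call read(i) for files 0..count-1 in parallel,
// while the calling thread passes results to consume(i, data) strictly in index order.
// A reader may claim file i only while i < nextToConsume + window (the sliding window).
void readAheadInOrder(std::size_t count, std::size_t readers, std::size_t window,
                      const std::function<std::string(std::size_t)>& read,
                      const std::function<void(std::size_t, const std::string&)>& consume)
{
    assert(readers > 0 && window > 0);  // otherwise the consumer would wait forever

    std::vector<std::string> slots(count);
    std::vector<bool> ready(count, false);
    std::mutex mtx;
    std::condition_variable cv;
    std::size_t nextToRead = 0;     // next index a reader will claim
    std::size_t nextToConsume = 0;  // next index the consumer needs

    auto readerLoop = [&] {
        for (;;)
        {
            std::size_t index;
            {
                std::unique_lock<std::mutex> lock(mtx);
                cv.wait(lock, [&] { return nextToRead >= count || nextToRead < nextToConsume + window; });
                if (nextToRead >= count)
                    return;
                index = nextToRead++;
            }
            std::string data = read(index);  // I/O runs without holding the lock
            {
                std::lock_guard<std::mutex> lock(mtx);
                slots[index] = std::move(data);
                ready[index] = true;
            }
            cv.notify_all();  // the consumer may be waiting for exactly this index
        }
    };

    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < readers; ++i)
        threads.emplace_back(readerLoop);

    for (std::size_t i = 0; i < count; ++i)
    {
        std::string data;
        {
            std::unique_lock<std::mutex> lock(mtx);
            cv.wait(lock, [&]() -> bool { return ready[i]; });
            data = std::move(slots[i]);
            nextToConsume = i + 1;  // slide the window forward
        }
        cv.notify_all();  // readers blocked on the window may claim another file
        consume(i, data);
    }

    for (std::thread& t : threads)
        t.join();
}
```

To scale with core count as described, `readers` could be derived from std::thread::hardware_concurrency(), clamped to at least 1 because that call may return 0. The window caps memory: roughly `window` file buffers are held between being read and being consumed.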

wojtek commented Feb 3, 2026 (Author)

COLD BUILD:
BASELINE: 7.96s, 4.45s, 4.48s → avg 4.47s (first run excluded: cold file cache)
IMPROVED: 2.68s, 2.66s, 2.83s → avg 2.72s
SPEEDUP: 1.64x (39% less time)

INCREMENTAL:
BASELINE: 1.14s, 1.16s, 1.16s → avg 1.15s
IMPROVED: 1.14s, 1.15s, 1.13s → avg 1.14s
SPEEDUP: ~1.0x (no change)
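
A quick check of the reported cold-build averages and the two ways of stating the speedup (the first, cold-cache baseline run is excluded, as noted above):

$$
\bar{t}_{\text{base}} = \frac{4.45 + 4.48}{2} \approx 4.47\,\text{s}, \qquad
\bar{t}_{\text{impr}} = \frac{2.68 + 2.66 + 2.83}{3} \approx 2.72\,\text{s}
$$

$$
\text{speedup} = \frac{\bar{t}_{\text{base}}}{\bar{t}_{\text{impr}}} = \frac{4.47}{2.72} \approx 1.64, \qquad
1 - \frac{\bar{t}_{\text{impr}}}{\bar{t}_{\text{base}}} = 1 - \frac{2.72}{4.47} \approx 0.39
$$

So a 1.64x speedup means about 39% less wall time, or equivalently about 64% more throughput.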

Tested on the following config:

```
# qgrep config for Unreal Engine 5.6.1

path E:/UnrealEngine-5.6.1-release/Engine

include \.(ini)$
include \.(cpp|c|h|hpp|cc|inl)$
include \.(ispc|isph)$
include \.(cs|vb)$
include \.(cmake)$
include \.(java|js|kt|kts|ts|tsx)$
include \.(md|rst|txt)$
include \.(pl|py|pm|rb)$
include \.(rs)$
include \.(usf|ush|hlsl|glsl|cg|fx|cgfx)$
include \.(xml|yml|yaml)$
include \.(uplugin|uproject)$
include \.(sh|bat)$
exclude ^DerivedDataCache/
exclude ^Intermediate/
```

186,283 files, 2GB input, 456MB index

zeux commented Feb 26, 2026 (Owner)

Is the main improvement due to read-ahead? The other changes (modulo some issues) would be easy to merge but that part is more unwieldy. So I'm wondering which changes are responsible for which speedups.

The notify_all -> notify_one change is not always efficient. If the queue reaches its maximum size and multiple threads wait on it, removing a large item should wake all waiting threads, as otherwise parallelism might be limited.
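
A sketch of the case described here, assuming a queue bounded by total payload bytes rather than item count (ByteBoundedQueue is hypothetical, not the project's class). When one large item is popped, several small pending pushes may now fit, so pop wakes every blocked pusher; with notify_one the others would stay blocked until a later pop.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical queue bounded by total bytes, not item count
class ByteBoundedQueue
{
public:
    explicit ByteBoundedQueue(std::size_t limit): maxBytes(limit) {}

    void push(std::string item)
    {
        std::unique_lock<std::mutex> lock(mtx);
        // Admit the item if it fits, or if the queue is empty so oversized items cannot block forever
        notFull.wait(lock, [&] { return items.empty() || bytes + item.size() <= maxBytes; });
        bytes += item.size();
        items.push_back(std::move(item));
        notEmpty.notify_one();
    }

    std::string pop()
    {
        std::unique_lock<std::mutex> lock(mtx);
        notEmpty.wait(lock, [&] { return !items.empty(); });
        std::string item = std::move(items.front());
        items.pop_front();
        bytes -= item.size();
        // Popping a large item may free room for several pending pushes at once
        notFull.notify_all();
        return item;
    }

private:
    std::size_t maxBytes;
    std::size_t bytes = 0;
    std::deque<std::string> items;
    std::mutex mtx;
    std::condition_variable notEmpty, notFull;
};
```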

Changing the order of normalizeEOL and convertToUTF8 is incorrect and would break UTF16 files as far as I can tell.

Some of the changes, like map -> unordered_map, findLineEnd, and sizeHint, are no-brainers and could easily be merged if submitted separately.

zeux added a commit that referenced this pull request May 12, 2026
- Avoid reallocations during file reading via sizeHint
- Avoid vector and string copies in a few places using std::move
- Avoid vector reallocation when stripping UTF8 BOM
- (unrelated) Switch to unordered_map for regexCache
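
A sketch of two of these patterns: stripping the BOM in place with vector::erase, which shifts the remaining bytes but does not reallocate, and handing a buffer over with std::move so the allocation is transferred instead of copied. stripUTF8BOM, FileEntry, and makeEntry are hypothetical names, not the commit's code.

```cpp
#include <cassert>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical helper: strip a UTF-8 BOM (EF BB BF) in place.
// erase shifts the remaining bytes down within the same allocation.
void stripUTF8BOM(std::vector<char>& data)
{
    static const char bom[] = "\xEF\xBB\xBF";
    if (data.size() >= 3 && std::memcmp(data.data(), bom, 3) == 0)
        data.erase(data.begin(), data.begin() + 3);
}

// Hypothetical sink: taking the buffer by value lets callers hand it over with std::move
struct FileEntry
{
    std::vector<char> contents;
};

FileEntry makeEntry(std::vector<char> contents)
{
    return FileEntry{std::move(contents)};  // moves the buffer pointer, no byte copy
}
```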

As a result, the UE5 source build is ~15% faster when running with a hot FS
cache (Linux/gcc).

Also fix the order of normalizeEOL vs convertToUTF8; the old order was
wrong, so UTF16 files would incorrectly encode CRLF for newlines.

Thanks @wojtek for the suggestions (PR #44).
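
A simplified illustration of why the order matters for UTF16, assuming EOL normalization is a byte-level CRLF-to-LF pass. normalizeEOLBytes and utf16leAsciiToUTF8 are hypothetical toy helpers, not qgrep's functions. In UTF-16LE, CRLF is the byte sequence 0D 00 0A 00, so a byte-level search for 0D 0A never matches; converting to UTF-8 first makes the pattern visible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy byte-level CRLF -> LF pass (hypothetical stand-in for EOL normalization)
std::string normalizeEOLBytes(const std::string& data)
{
    std::string result;
    result.reserve(data.size());
    for (std::size_t i = 0; i < data.size(); ++i)
    {
        bool crBeforeLf = data[i] == '\r' && i + 1 < data.size() && data[i + 1] == '\n';
        if (!crBeforeLf)
            result += data[i];
    }
    return result;
}

// Toy UTF-16LE -> UTF-8 conversion that only supports code points below U+0080
std::string utf16leAsciiToUTF8(const std::string& data)
{
    std::string result;
    for (std::size_t i = 0; i + 1 < data.size(); i += 2)
    {
        assert(data[i + 1] == '\0' && static_cast<unsigned char>(data[i]) < 0x80);
        result += data[i];
    }
    return result;
}
```

With "a\r\nb" encoded as UTF-16LE, normalizing first and then converting leaves "a\r\nb" (the CRLF survives), while converting first and then normalizing yields "a\nb".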